Analysis of the online retail store in the year 2011

The dataset we are using comes from UCI Machine Learning repository. The dataset is called “Online Retail” and can be found in http://archive.ics.uci.edu/ml/datasets/online+retail.

In [1]:
import function as f 
In [2]:
#read dataset: 

retail_db = f.read_dataset()
retail_db.head()
Out[2]:
invoice_id stock_code description quantity invoice_date unit_price customer_id country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom

After pre-processing, the dataset includes 379.979 records and 17 fields:

invoice_id, stock_code,

description, quantity,

invoice_date, unit_price,

customer_id, country,

order_status, sales(£),

sales(£)_abs, quantity_abs,

time, hour, date, month, year

In [3]:
#data pre-processing:
RETAIL = f.data_pre_processing(retail_db)
RETAIL.head()
Out[3]:
invoice_id stock_code description quantity invoice_date unit_price customer_id country order_status sales(£) sales(£)_abs quantity_abs time hour date month year
26850 539993 22386 JUMBO BAG PINK POLKADOT 10 2011-01-04 10:00:00 1.95 13313.0 United Kingdom Approved 19.5 19.5 10 10:00:00 10 2011-01-04 2011-01 2011
26851 539993 21499 BLUE POLKADOT WRAP 25 2011-01-04 10:00:00 0.42 13313.0 United Kingdom Approved 10.5 10.5 25 10:00:00 10 2011-01-04 2011-01 2011
26852 539993 21498 RED RETROSPOT WRAP 25 2011-01-04 10:00:00 0.42 13313.0 United Kingdom Approved 10.5 10.5 25 10:00:00 10 2011-01-04 2011-01 2011
26853 539993 22379 RECYCLING BAG RETROSPOT 5 2011-01-04 10:00:00 2.10 13313.0 United Kingdom Approved 10.5 10.5 5 10:00:00 10 2011-01-04 2011-01 2011
26854 539993 20718 RED RETROSPOT SHOPPER BAG 10 2011-01-04 10:00:00 1.25 13313.0 United Kingdom Approved 12.5 12.5 10 10:00:00 10 2011-01-04 2011-01 2011

With the pre-processing data let's go deep into the store behavior during the year 2011.

In [5]:
#finance and sales report:

FINANCE_REPORT, SALES_REPORT = f.finance_and_sales_report(RETAIL)

#customer and product behaviors:

COUNTRY_TOP10, TIME, RETURNED, REPURCHASE_RATE, PROD_TOP10, SALES_TOP10, ITEM_INVOICE = f.customer_and_product_behavior(RETAIL)

#Charts:

SALES_CHT_LINE, TIME_HIST, SALES_TOP10_CHT_BAR, PROD_TOP10_CHT_BAR, ITEM_INV_HIST = f.charts(SALES_REPORT, TIME, SALES_TOP10, PROD_TOP10,ITEM_INVOICE)

REPORT

This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.

The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

The Year of 2011, this store has 8.9 M£ in revenue and sold almost 5M in products. Presented in this year an average ticket of 23.51£.

Below is possible to see the financial report.

In [6]:
FINANCE_REPORT
Out[6]:
order_status quantity_abs(100k) sales(£100k) customer_id avg_ticket
0 Approved 48.69 83.39 371764.0 22.43
1 Cancelled 2.59 5.93 8215.0 72.21
2 Total 51.28 89.32 379979.0 23.51

In a way to study the financial behavior of this company, it was possible to separate the revenue and products volumes along the year month by month.

Comparing the period of September to November it is possible to identify that revenue and volumes increased almost 50% compared to the first semester. This behavior might be justified by the end-year parties and the promotional dates such as Black Friday in November. An indication of that might be the sales decrease in December.

In [7]:
SALES_CHT_LINE

As online retail, the stores have registered sales in 36 countries. Although the store sells worldwide, its main market is the United Kingdom responsible for 81% (approx £6M) of sales in 2011.

In [8]:
COUNTRY_TOP10
Out[8]:
country quantity sales(£k) sales(%)
34 United Kingdom 3755721 6284.07 81.13
22 Netherlands 193317 275.88 3.56
10 EIRE 131471 242.46 3.13
14 Germany 110725 207.14 2.67
13 France 104870 187.14 2.42
0 Australia 83199 136.07 1.76
31 Switzerland 29064 54.43 0.70
29 Spain 25957 52.93 0.68
3 Belgium 21397 39.10 0.50
30 Sweden 31923 33.95 0.44

There are many ways to validate customer fidelity, one of which is the repurchase rate.

The repurchase rate of that store is 98.23% that means that the customers return to buy after the first time.

as we can see with a 2.16% of the invoices with status canceled (products returns) the satisfaction with the products is high also so, probably, the main reason for these returns is for give-up or it didn't attend the customers expectations.

In [9]:
print('Repurchases rate {}%.'.format(REPURCHASE_RATE).upper())

RETURNED
REPURCHASES RATE 98.23%.
Out[9]:
order_status quantity_abs invoice_id sales(£k) sales(%) quantity(%) invoice(%)
0 Approved 4869416 371764 83.39 93.36 94.95 97.84
1 Cancelled 258890 8215 5.93 6.64 5.05 2.16

Now that we know most of the customers as fidelity to the store we can analyze the top 10 products behaviors, in what time the customers used to buy, and how many items are ordered by invoices. This kind of information could give some insight to raise the sales.

There is a bias between the hour of the day and order volume. With the histogram, we could conclude that most orders happened between 10:00 - 15:00. So if the company is going to do a promotional voucher, probably, must avoid this time and do during the time people don't use to buy such as between 6 AM - 9 AM or 5 PM - 7 PM to increase the number of orders during this hours.

In [10]:
TIME_HIST

Invoices with C in the register are considered a cancel order. ​so we consider it as a return. We could conclude that people mostly purchase less than 10 items in each invoice

In [11]:
ITEM_INV_HIST

below it is possible to see the top 10 products that brought more revenue and also the top 10 products most sold.

In [12]:
SALES_TOP10_CHT_BAR
In [13]:
PROD_TOP10_CHT_BAR